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Abstract 

Coping with outliers contaminating dynamical processes is of major importance in various appli- 
cations because mismatches from nominal models are not uncommon in practice. In this context, the 
present paper develops novel fixed-lag and fixed-interval smoothing algorithms that are robust to outliers 
simultaneously present in the measurements and in the state dynamics. Outliers are handled through 
auxiliary unknown variables that are jointly estimated along with the state based on the least-squares 
criterion that is regularized with the £i-norm of the outliers in order to effect sparsity control. The resultant 
iterative estimators rely on coordinate descent and the alternating direction method of multipliers, are 
expressed in closed form per iteration, and are provably convergent. Additional attractive features of 
the novel doubly robust smoother include: i) ability to handle both types of outliers; ii) universality 
to unknown nominal noise and outlier distributions; iii) flexibility to encompass maximum a posteriori 
optimal estimators with reliable performance under nominal conditions; and iv) improved performance 
relative to competing alternatives at comparable complexity, as corroborated via simulated tests. 
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I. Introduction 

Estimating the state of dynamical systems is of paramount importance in various applications including 
tracking and navigation. A major challenge in these applications is deviation from nominal conditions, 
which gives rise to outliers in the observations and state dynamics. Outliers in the state may come from 
abrupt changes in the target position due to, e.g., unexpected turbulence, and velocity variations due to 
target maneuvering. Outliers in the observations typically occur because of clutter, and glint noise ll26l . In 
addition, both types of outliers can arise after linearizing the emergent nonlinearities, as in the extended 
Kalman filter (EKF) 01, f24ll . The clairvoyant Kalman filter (KF) and smoother (KS) can not handle 
state and/or measurement outliers |]3j , ||27~I . because both can be viewed as minimizers of a weighted 
least-squares (WLS) criterion, which is known to be sensitive to outliers ll22l . 

Robustification of KF and KS dates back to the '70s f27ll . but remains an active area of research until 
today [331, EH, continuously leveraging advances in convex optimization Q, Q. Despite these advances, 
existing robust KF and KS approaches have several limitations. Some consider outliers only in the 
measurements |[33l . while others can handle either type of outliers alone but not both simultaneously ETl . 
Most approaches capitalize on robust e.g., M-estimators 11351 . which rely on Huber's and other outlier- 
resilient criteria ||20"1 App. A6.8]. They require knowledge of the nominal distribution, and are effective 
only when the nominal noise is independent across observations and state entries ll23l Chap. 7]. In the 
presence of correlated Gaussian noise, pre-whitening yields independent noise entries, which is required 
for M-estimates to be applicable IT351 . However, pre-whitening spreads the outliers to non-contaminated 
measurements. Approaches to doubly robust fixed-lag smoothing rely on heuristics to determine whether 
outliers are present in the state or the measurement equation [35]. 

A recent scheme for robust fixed-interval (but not fix-lag) smoothing is reported in Q, treating non- 
linearities in the state and measurement equations separately from robustness issues. In the development, 
nonlinearities are linearized, and the measurement noise is assumed to follow the so-termed ^i-Laplacian 
(or a Huber) distribution parameterized by a matrix R. The choice of R (and likewise that of Huber 
thresholds) critically affects smoothing performance, but systematic means of selecting these parameters 
was left open in Q. Finally, a class of robust schemes popular in computer vision for linear regression 
settings comprises the so-termed random sample consensus (RANSAC)-based algorithms lfl4l . EDI . 

If the outlier distributions are known and the model is linear and Gaussian (when conditioned on the 
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outliers), efficient sequential Monte Carlo (SMC) smoothers based on Rao-Blackwellization (9[ as well 
as deterministic algorithms based on pruning techniques, such as the interacting multiple model (IMM) 
method ifTTTl . will offer viable alternatives. Unfortunately, accurate description of the outlier distribution 
can be hard to obtain in practice. In addition, the complexity of SMC methods can be prohibitive for 
medium-to-large size problems due to the curse of dimensionality |[T2l . 

In the present work, outliers are handled through auxiliary unknown variables that are jointly estimated 
along with the state. The resultant estimators rely on constraining the degree of outlier scarcity through t\- 
norm regularization, which is imposed on the auxiliary variables to enable sparsity control. The proposed 
robust smoothers: i) can handle both types of outliers simultaneously (hence referred to as doubly robust); 
ii) are universal, meaning they can operate even when the distributions of the nominal noise and outliers 
are unknown; iii) possess maximum a posteriori (MAP) optimality under specific assumptions on the 
outlier and nominal noise distributions; iv) perform well under nominal conditions (i.e., with no outliers 
present); and v) outperform RANSAC- and Huber-based robust smoothers. 

Unlike ordinaiy KS, the novel robust estimators are nonlinear functions of the data, and rely on the 
alternating direction method of multipliers (AD-MoM) or coordinate descent iterations. Closed-form 
expressions render the bulk of complexity per iteration comparable to that of KS, which is linear in the 
observation time. Few iterations of the coordinate descent or AD-MoM-based algorithms are required in 
practice to obtain satisfactory results. Numerical tests demonstrate that the developed methods can reject 
state and measurement outliers, and outperform RANSAC and Huber-based techniques. 

The rest of the paper is organized as follows. Section II contains preliminaries and the problem 
statement. Fixed-interval doubly robust smoothing (DRS) is introduced in Section III, where the link 
between robustness and sparsity is also established. Selection of the regularization parameters is the 
subject of Section IV. The coordinate descent based DRS algorithm is developed in Section V. Fixed-lag 
DRS is dealt with in Section VI. An alternative formulation for general linear state-space models is 
developed in Section VII. Simulations are presented in Section VIII, and conclusions in Section [DD 
Notation: Column vectors (matrices) are denoted with lower- (upper-) case boldface letters; (-) T stands 
for transposition; Otv is the JV x 1 column vector with all zeros; and I at is the N x N identity matrix. 
Given a set S C M. N , the indicator function is defined as ls(x) = 1 if x € S, and ls(x) = 0, otherwise. 
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II. PROBLEM STATEMENT AND PRELIMINARIES 
Consider the following outlier-aware state-space model 

x n = F n x„_i + w n + o x , n , n = l,...,N (la) 
y„ = H n x„ + v n + Oj, )Tl , n = l,...,iV (lb) 

where x n € It * and y n E jR^*" denote the state and measurement vectors at time n, respectively; 
w n and v n are mutually independent, zero-mean nominal noise vectors, each independent across time, 
and from the initial state xo, with respective covariance matrices {Q„, R n }^ =1 ; xo has mean mo and 
covariance So; and {ox,n,Oy,n}n=i represent the unknown state and measurement outlier vectors. 

Given {F n , H„, Q„, R n , y n }„ =1 , mo, and So, the goal of fixed-interval DRS is to estimate {x n }^ =1 
and {o x n , Oy tn }n =1 . Different from J3], E71 . 11331 . ll35l . note that the outliers are explicitly introduced and 
treated as unknown variables to be estimated. This problem can be cast as one of linear regression, since 
x n = F„x n „i+o Zjn +w n can be viewed as an extra "zero measurement" = — x n +F„x n _i+o a;)ri +w n ; 
and similarly for the initial condition as —mo = — xo + wo, where wo is zero-mean with covariance So- 
Thus, £[)) can be expressed in a matrix-vector form as 



-I 









w 




-m 


Fx -I 




xo 




O x ,l 




Wi 











Xl 














F N -I 




x 2 


+ 


O x ,N 


+ 


W N 







Hi 












Vl 




yi 




















H^v 




°y,N 




Vat 




YN 



or in a more compact form (with obvious definitions) as 

Ax + o + w = y (3) 

where matrix A is tall, and vector w has block diagonal covariance matrix Q w := diag(So, Qi, . . . , Qn, 
Ri, . . . , Rat). Since both x and o are unknown, the linear system in © is clearly under-determined. 
When there are no outliers (cf. o = 0) and A is full rank, the WLS estimate [cf. ©] x := arg min x (y— 
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Ax) T Q^ 1 (y — Ax) yields the KS. Substituting from ©, this estimate can also be written as CD P- 189] 

1 N 1 1 r 

x KS : = arg min - ^ ||y n - H n x n |j^-i + -||x - m |||-i + - ||x n - F n x n _i||^_i (4) 

n=l n=l 

where Hx)^ := x T Mx. The estimate x KS is also known as the Rauch-Tung-Striebel (RTS) smoother 
|[30l . It is both minimum mean-square error (MMSE) and maximum a posteriori (MAP) optimal if the 
initial state and all nominal noise vectors are Gaussian; otherwise, it is linear (L)MMSE optimal. In fact, 
adding to the WLS cost in © a ridge regularization term A||x||| to constrain the £2 norm of x, the 
resultant ridge WLS, as well as the (L)MMSE and MAP, all yield a unique estimate (even for under- 
determined models), and can be rendered equivalent depending on the assumptions and corresponding 
optimality claims one is willing to make. The exposition henceforth is centered around the (regularized) 
WLS approach, because it is universal with respect to (wrt) the underlying probability density functions. 

With o = (or known for that matter), the state can be clearly estimated by solving the equations 
(A T Q~ 1 A)x = A T Q~ x (y — o), where the matrix A T Q~ 1 A has a block tridiagonal structure [cf. 
© and ©]. This allows obtaining the solution in batch form at complexity which is linear in N (19\ 
p. 174]. Alternatively, one can use the forward-backward algorithm in e.g., 0] p. 189] or 11301 to solve 
© recursively. The forward direction is a KF followed by the backward run, which smooths the filtered 
estimates. The forward-backward algorithm also exhibits linear complexity in N. In a nutshell, both batch 
and recursive solvers of ©-© exhibit low complexity (linear in N) when o is known. 

If unknown outliers o are present in ©, and one chooses to ignore them and run a clairvoyant KS 
as if o were absent, the MSE performance will be poor because the (W)LS criterion is known to be 
severely affected by outliers (23]. This mandates dealing with the outliers in © explicitly - a challenge 
addressed in the next section by exploiting sparsity constraints on o. 

III. Robustness by controlling Outlier Sparsity 

The under-determinacy in © when o is unknown, raises non-uniqueness and thus state identifiability 
issues. Ridge WLS, (L)MMSE, and MAP estimators cannot recover the exact x, a fact confirmed by 
the nominal-noise-free setup [cf. w = in ©], where one faces an under-determined system of linear 
equations generally admitting infinite solutions. Key to addressing this issue is the degree of sparsity 
(number of nonzero entries) of the vector o - an attribute offering the potential for solving uniquely 
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under-determined systems of linear equations, as established recently in the context of compressive 
sampling IfTUI . This motivates recovery of a controllably sparse estimate of o by effecting sparsity through 
an £o-(pseudo)norm regularization term. Specifically, the proposed robust smoother aims at 

[x,6]:=arg min -||y - Ax - o||i_i + A||o|| (5) 

x,o Z 

where the scalar A is used to control the degree of sparsity in o. The level of outlier sparsity can be 
selected by tuning A, and the outliers can then be estimated jointly with the state via ([5]). Unfortunately, 
the lQ-novm renders the problem non-convex and in fact NP-hard, which suggests a convex relaxation 
using the closest convex approximation to the l^-novm, namely the £i-norm |[T0l , ||37l . 

Using an £i-norm regularization and defining o x := [o^ 1; . . . , N ] T , o y := [o yl , . . . , N ] T , the 
novel DRS approach amounts to [cf. £[])-©] 



fl N 1 N 

[x^"", 'o x ,'o y \ := arg mm { - y~] \\y n - H n x n - o^L-i + - ^ ||x n - F n x n _i - o^U" 

o^,o y I Z n Z 

\ 71=1 71 = 1 

N \ 

--i + [Axllo^^Hi + Aylloy^Hi] > 

n=l J 



+^||x - m ||^-i + > ^ [A x ||Oa., n ||i + A„||o w , n ||i] } (6) 

where X x and \ y are introduced in (l6l i to allow individual control of sparsity levels in o x n and Oy n . 
Viewing the cost in © as a Lagrangian function, allows casting this unconstrained minimization problem 
as a constrained one. Indeed, sufficiency of the Lagrange multiplier theory implies that O Sec. 3.3.4]: 
using the solution o x ,o y of © for given multipliers \ x , X y > and letting t x := ||o x ||i, r y := ||6j,||i, 
the equivalent constrained minimization problem entails the WLS cost (quadratic terms in ©) subject 
to the constraints ||o x ||i < t x , and Ho^Hi < r y . Note however, that \ x (\y) in ©, and likewise r x {T y ) in 
its constrained equivalent, are tuning parameters and not optimization variables. 

The DRS state estimate in © can cope with outliers jointly present in the state and in the measurements. 
In addition, it is universal because it does not require knowing the distribution of the nominal noise or 
the outlier vectors. (The choice of \ x and X y discussed in the next section will not follow from the 
distribution of a contaminating model but will be data driven.) Different from and ||39l which enforce 
sparsity in the state, DRS controls sparsity in the outliers to effect robustness. At this point, it is worth 
recalling that o in smoothing dynamical processes is indeed sparse, since it models abrupt changes (target 
maneuvers) in the state which cannot be too many in the analysis window, and glint noise giving rise to 
large-magnitude observations which occur rarely too. Having explained why it is meaningful to expect 
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only few nonzero entries in o, it is also useful to clarify that this is not necessary. (Simulated tests in 
Section VIII will allow for outlier contamination as high as 80%.) Although smoothing performance 
degrades as the number of nonzero entries in o increases, all the proposed approach needs is a handle 
on the percentage of outliers without requiring this percentage to be necessarily low. 

The WLS cost can be also replaced by other functions (e.g., the £i-norm of the error), and alternative 
regularization terms (e.g., the ^2-norm of the outliers) can be employed instead of, or, in addition to 
the ^i-norm |[T8l . Non-convex costs and regularizes are also possible, but they are not recommended as 
stand-alone solvers of © because they cannot guarantee convergence to the global optimum. In contrast, 
it will be seen in Sections V and VI that ((6]) and variants involving t\ and £2 norms can afford not only 
globally convergent but also computationally efficient solvers. 

Having clarified that the ensuing developments will rely on ©, which is meaningful regardless of 
the {w, v, 0} distributions, it is natural to ask the following question: Under what assumptions on these 
distributions can one claim MAP optimality of the resultant state and outlier estimators? The ensuing 
proposition (proved in Appendix IA1 asserts that this is possible if the nominal noise vectors are Gaussian 
and the additive outliers are known to be Laplacian distributed. 

Proposition 1. Suppose w n and v n are Gaussian distributed, mutually independent, and independent from 
o x ,n and Oy jTl , respectively. Furthermore, assume o x ,n has Laplacian distributed entries o xn< i that are 
independent from past states, past state outliers, measurement outliers, and across different dimensions; 
that is, o x ^ n ^ and o xn $ are independent for d 7^ d! . Similarly, o y n has Laplacian distributed entries 
°y,n,d that are independent from past states, past measurement outliers, state outliers, and across different 
dimensions; then the estimators obtained as in (f6]) are MAP optimal. 

Albeit simple to prove, the usefulness of Proposition 1 is twofold: (a) it allows for a side-by-side 
comparison with the MAP optimality offered by the clairvoyant KS in |4|); and (b) it positions the 
proposed approach in the context of related MAP-optimal schemes adopting i\ -error based smoothers; 
see e.g., 121 and references therein. Specifically, different from the multivariate Laplacian in Proposition 
1 described by the two scalar X x and X y parameters, the l\ -Laplacian model in Q entails a D y x D y 
matrix of parameters that are assumed known. 

Next, robustness of the estimators © is established. Specifically, the ensuing proposition proved in 
Appendix |B] shows that DRS subsumes Huber's M-estimator as a special case. 
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Proposition 2. When {Q„, R n }^ =1 and are all identity matrices, the DRS in ([6]) boils down to solving 
the following Huber M-estimator problem 



x := arg mm \Y^Y^P\ V (y n ,d ~ h^ d x n ) + ^ 

=1 n=l d'=l 



N 

(x 04 , - m 04 ,) 2 + ^ PxA x n,d' - f M' x «- 



n=l 



(7) 



where y n := [y nA , j/„ ;2 , y n ,D y ] T , x„ := [x nA , x n>2 , ■ ■ ■ , ^n,Z)J T . x := [xf , x^, . . . , x^] T , H n := 
[h ni i, . . . , h n)jD J T , F n := [f n) i, . . . , f n ,D x ] T , and p\ denotes the Huber cost ft23§ 



p x (r) := < 



|r 2 , if \r\ < X 

Air I — 4 1 , otherwise. 



Proposition 2 generalizes to dynamical systems the link between (O and (0 established in |[T7l for 
linear regression models. As a result, DRS also inherits the robustness attributes associated with the 
Huber M-estimators. Figure [T] depicts Huber's cost along with the quadratic one. For small residuals 
(i.e., |r| < A), p\(r) coincides with the quadratic one. But for \r\ > A, Huber's cost grows only linearly 
with r, which allows for down-weighting large errors. Therefore, outliers which are responsible for large 
errors will be weighted less in the overall objective function. Clearly, for large values of A, Huber's cost 
coincides with the quadratic one. As a consequence, a large number of outliers in the observations and 
state is effected through small X y and X x , respectively. Finally, it should be mentioned that the Huber 
function is not the only one enabling robustness. A gamut of related robust costs can be found in e.g., |[20l 
Appendix A6.8] with different properties. The most convincing reason for exploiting sparsity constraints 
under the ^i-norm of the outlier vectors is to leverage recent advances on compressive sampling to 
develop the computationally efficient and globally convergent solvers presented in Section V. 

While DRS inherits the robustness features of Huber's M-estimator, it enjoys several advantages over 
it, as detailed in the following two remarks. 

Remark 1. As mentioned earlier, the universality of DRS pertains also to the regularization term. If 
outliers are present in all entries of x n or y n , this form of group sparsity can be effected by replacing 
1 1 1 1 1 an d ||o Zjn ||i in © with Ho^l^ and ||o Xin ||2, respectively. Using the latter regularization, either 
°y,n = ®D y (Ox,n = 0/) x )> or oil the entties of o y ^ n (o x>n ) are nonzero, signifying the presence of outliers 
in all measurement (state) variables at time n. The cost function resulting from £2 -norm regularization 
is still convex BTI . and its minimization can be carried out using solvers similar to those of © to 
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be presented in Sections V and VI. For this reason, only £i-norm regularization will be considered 
henceforth. 

Remark 2. In addition to universal robustness, the novel approach to DRS is also flexible in three counts. 
First, Huber estimators fix X x or X y a fortiori based on knowledge of the nominal distribution and the 
contamination model, e.g., for the e-contaminated class with Gaussian nominal, it follows that X x = X y = 
1.345 |[23l . In contrast, DRS does not assume any specific model for the outliers' distribution. From 
this viewpoint, M-estimators are subsumed by the present formulation as special cases corresponding 
to specific values of X x and X y . In addition, DRS can accommodate colored noise [cf. ©], which is 
formidable for the robust estimators of lETI and ll35l because pre-whitening in §7} with Q^ ' 5 and R^ 5 
spreads the outliers to non-contaminated measurements. Finally, DRS not only allows one to apply KS 
on outlier-free data but also reveals the outliers - a feature not available to Huber-based approaches, 
which only implicitly incorporate the outliers. 

The next section presents systematic means of adjusting X x and X y to accommodate fully nominal 
settings (i.e., no outliers), fully contaminated scenarios, and all cases in between, even when the degree 
of contamination is unknown. 

IV. Selecting X x and X y 

Parameters A x . and A^ control the level of sparsity in the estimated outlier vectors, and their judicious 
selection is crucial for the successful operation of DRS. Too large a value for these parameters reverts 
DRS back to the KS, which is non-robust. On the other hand, very small values give rise to many 
spurious state and measurement outliers, thus degrading DRS performance. Standard cross-validation 
techniques IlTTI . are not effective when outliers are present ll25l . Toward choosing proper values of A^ 
and X y , the next proposition provides computable bounds so that if X y > X y and X x > X x , then DRS 
coincides with KS. (See Appendix ICl for the proof.) 

Proposition 3. The DRS estimate in ((6]) coincides with KS estimate x KS if 

X y >Xy := max ||R^ 1 (y n -H n ,^ s )|| (8a) 
X x >X X := max llQ^f-FnX^))! (8b ) 

l<n<N 11 

Having established the upper bounds in d8]), desirable values for A x and X y will be points in the rectangle 
[0, X x ] x [0, X y ]. Consider a two-dimensional grid on this rectangle and a properly chosen cost generated 
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by each grid point. Depending on the information available to the designer, the "best" A x and X y will be 
those values minimizing either one of two costs presented in the ensuing subsections. 

A. Known percentage of outliers 

Here the percentage of (non-)zero entries of the outlier vectors is assumed (at least approximately) 
known; denote them as tt 0jX and n 0> y. Consider the 2-D grid on [0, X x ] x [0, Ay], comprising I x points 
along the X x axis and I y points along the \ y axis. Let (i x , i y ), with 1 < i x < I x and 1 < i y < I y be the 
grid point corresponding to values X x (i x ) and X y (i y ). For the given (i x ,i y ), solve © with X x = X x (i x ) 
and X y = X y (i y ), to obtain 5t(i x ,i y ),o x (i x ,i y ) and o y (i x ,i y ). With supp(x) representing the non-zero 
entries of x, and |x| the number of entries of x, the "best" X x (i*) and X y (i y ) are found as those with 

\supp(o y (i x ,iy))\ 



[i*,i*] := argmin 



\supp(o x (i x ,i y ))\ 



+ 



\®y tyx j iy) I 



(9) 



I (ix j iy) | 

Finding X x (i x ) and X y (i y ) as in (O, appears to require solving (|6) for all pairs (i x ,i y ) of the two- 
dimensional grid. The associated computational cost can be viewed as the "price paid" for the universality 
attribute of DRS elaborated in Section III. Instead of two, (A x ., X y ), recall that the number of parameters 
(and thus dimensionality of the search space had those been unknown) in is 0(Dy). Of course, this 
is not an issue in |31] where these parameters are assumed known. 

Fortunately, the special structure of the optimization problem in © allows for solvers at complexity 
lower than running I x I y robust smoothers, one per (X x , X y ) point on the grid. Indeed, © can be formulated 
as a quadratic program (QP), and its form can leverage recent advances in computing the so-termed least- 
absolute shrinkage and selection operator (Lasso), originally developed for static linear regressions; see 
e.g., EH . As will be detailed in Section V, Lasso can be also exploited for the DRS dynamical model 
considered here. General-purpose QP solvers incur polynomial complexity up to 0(Z? 3 ' 5 ) per iteration, 
where D is the number of optimization variables involved [8]; here, D = N(2D X +D y + 1). The reduction 
to 0(D) per iteration afforded by Lasso-based solvers becomes possible by starting from A := (X x ,X y ) 
(sparsest initialization) and solving successively over decreasing A-points on the grid, using coordinate 
descent iterations. Qualitatively speaking, about one nonzero entry of o emerges per A-point on the grid, 
and its value is used to initialize the iteration for the next point on the grid (warm start) ITT31 . lERJI . 
Especially for large problem dimensions (D 3>), it has been demonstrated that such Lasso solutions 



January 20, 2013 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING (REVISED) 1 1 

for the entire so-termed regularization path (corresponding to all A-points on the grid), can be more 
computationally efficient than solving Lasso even for a single fixed point A on the grid; see also IT161 . 



B. Known covariance of nominal noise vectors 

The key observation here is that if the estimates o x>n (i x ,i y ) and o y ^ n (i X: i y ) are accurate, then x n (i x , i y )- 
(i x ,i y ) should have the same statistics as w n ; and likewise, the statistics of 
y n — H n 5t n (i x , i y ) — o yin (i x ,i y ) should coincide with those of v n . Focusing for instance on second-order 
statistics, if these estimated residuals are pre-whitened (by left-multiplication with Q^ ' 5 and R~ 0,5 ), 
they should have zero mean and unit variance. Thus, upon pre-whitening and averaging, their sample 
variance should approach 1. As a consequence, the "best" X x (i x ) an( ^ \{iy) we found as those with 

= argmin \l - a 2 e (i x ,i y )\ (10) 



i^xi iy] 



+ || w n(*xi iy) IIq-i 

&e{ixiiy) : = 



where 



ND y + (N + l)D x 

^niixiiy) • — Yn H n x n {i x , i y ) o y ^ n (i X ji y ) 

:= ~x. n (i X iiy) F n x n _i(i x . , i y ) o xn (z a; , i y ) 

wo(ix,i y ) ■= %){i x ,i y ) - m . 

The number of grid points I x and I y should be chosen large enough to ensure that a point in the vicinity 
of the global minimum of ( fTOl is obtained. The grid need not be uniform. Indeed, simulations confirm 
that the search is more efficient if grid points are chosen on the log scale; see also ifToll . This parameter 
tuning method, will be henceforth referred to as absolute variance deviation (AVD). Since DRS in © 
requires knowledge of nominal noise covariances, the AVD scheme needs no additional assumption; and 
similar to the method of the previous subsection, it can capitalize on Lasso coordinate descent based 
schemes to lower the computational complexity of solving © per grid point, as detailed next. 

V. DRS via Coordinate Descent 

While general purpose QP solvers can be utilized to solve © with polynomial complexity in N, their 
complexity can still be too high when is large. A reduced-complexity alternative is developed in this 
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12 



section to solve ((6]) using block coordinate descent iterations. Letting C(x, o x ,o y ) denote the cost in © 
and j indexing coordinate descent iterations, the following sub-problems are solved per iteration j and 
coordinate dimension d 



1® = argmin 0(^,0^,0^] 



x,n,d 



arg min C(x^ , o£] o^ n _ x , o^ J n>1:d _ v o x>n4 , o^J +1 . Dx , 



Cj-i) 



O x ,n,d 



o^'- 1 ) (j-i) 



(11a) 



(lib) 



p) - arffm i n (7f x 0) o 0') Q (J) J3) (i) „ , o 0'-i) Q (i-i) (i_1) ) Cllc) 

Vy.n.d 



where (11 lb| ) is solved for n = 1, . . . , N and d = 1, . . . , D x , while d 1 lcb is solved for n = 1, . . . , JV and 
d = 1, . . . , D y . The initial conditions are = Ond t and = Ond v - 
The optimization in (II lab can be explicitly written as 

2 



x 



Cj) 



arg mm 



1 N 



H„x„ - o 



O-i) 



1j.1t 



n=l 



1 N II 



Cj-i) 



n=l 



+ - x -m L- 

q- 1 2 ^0 



(12) 



Solving (1121) is equivalent to finding the KS estimate for a system with outlier-compensated measurements 

(j—i) (i—l) 
y n —Oy,n , and outlier-compensated state x n — Ox, n ■ Therefore, either the batch or the forward-backward 

recursive algorithms reviewed in Section II can be adopted to solve (fl2T) with linear complexity in N. 

Focusing on (lllbl ). one should solve 



U) ■ 1 



x 0')_ F x (j) - 



O 



Cj) 



x.n.l.d— 1 



Ox,n,d 
J x,n,d+1:D X 



(13) 



Qn 1 



for every d = 1, . . . , Ac and re = 1, . . . , N. The scalar problem (TT3T > is solved using the Lasso, which 
can afford a closed-form solution [2~Q. Indeed, ([T3l can be equivalently expressed as 



'x,n,d ^ arg ™ m d 2 ^> n ' d " 



S3) 



^~ ^x,n,d I O x n,i 



(14) 



where 



C0 := 

' x,n,d 



1 



Qn,d,d 



d-l 



fc=i 



fc=ci+l 



2n,k,d°x,n,k 1 



a 



n 1 Cx (j) - F x (i) 



^x,n,d • — / Qn,d,d 



January 20, 2013 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING (REVISED) 



13 



and Q n x has entries [Q,, 1 ]*; fc' := q n ,k,k'- The solution to (fT4l ) is given by 



o 



(J) 

x,n,d 



7 0) d 



A 



x,n,d 



S1 § n I7?i,d 



where := max(x,0) and sign(-) denotes the sign operator. 
A similar closed-form solution becomes available for (II let , since 



So) . 



oT. L w '■= arg mm - 

y ' n ' d o y , n , d 2 



for every d = 1, . . . , D y and n = 1, 



v — H x^) 



O 



00 

y,n,l:d— 1 



J y,n,i 



O 



Cj-i) 

j/,n,d+l:Dj, 



iV. 



Problem (031 ) can be alternatively written as 



(15) 



where 



7°' } „ == 
' y> n ,d 



[ a jfn,d ^ r n,k,d°vli.k ^ r n,k,d°y,n,k 1 



d-1 



fn.d.d 



fc=l 



„(.?') — R-i f v _h x (i) 



fc=d+l 
^■y,n,d := ^-y / r n,d,d 



(16) 



and R n x has entries [R n 1 ]fcfe' := r n /.k'- The solution to (fT4l ) is given by 



\n,d 



ly,n,d 



^y,n,d 



sign 7 



iy,n,d 



Global convergence of the (IT2li-(fT5b iterates is guaranteed from the results in ||38l , as summarized next. 
Proposition 4. For any initial values y^°\ o^\o^\ the iterates in (1121 ), (1131 ) and (1151 ) are a/Z convergent. 
Furthermore, every limit point of the sequences x^, , Oj, solves ©. 

Note that ([121 ) contains the bulk of computation per iteration j, and its complexity is equivalent to 
that of KS, which is linear in N. This should be contrasted with the general purpose convex solvers 
whose complexity is polynomial in N (worst-case of order C9(iV 3 5 ); see e.g., JH). As mentioned 
earlier, the complexity reduction is due to the unique properties of Lasso-related problems, namely 
variable separability, closed-form thresholding per variable, and warm starts. Coordinate descent solvers 
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capitalize on these properties, and have been documented to outperform competing alternatives, including 
off-the-shelf QP solvers 1151. lfl6l. BUI. 

Remark 3. This section's efficient solvers of the ^i-norm based convex cost in © will converge to 
estimates generally not coinciding with the global optimum of the ultimate ^o- norm based sparsity- 
promoting cost in (|5). This motivates concave regularization terms, which offer improved approximations 
of the ^ - norm relative to that offered by the ^4 -norm ||25l . One such alternative leads to solving 

1 N D x N D y 

[x,6] := arg min — ]|y — Ax-o||^_i + ^ ^log^^l + 8 X ) + X y ^ ^log(|o yin>(i | + S y ) (17) 

n=l d=l n=l d=l 

where 8 X (5 y ) are small positive constants to ensure that the argument of the logarithm stays away from 
zero. Since the cost in (fTTT l is non-convex, it is recommended to initialize its iterative minimization with 
the efficient convex solver of ©. Starting with such an initialization (x(°),o(°)), the logarithm can be 
successively linearized around the l-th iterate using log(i+<5) « log(t®+5) + (t-t®)/(t®+8) to arrive 
at a convex cost, which can be readily optimized to obtain the estimates at iteration (7 + 1). Specifically, 
at iteration / one solves 

1 N N D y 

[x«,o {0 ] := arg min -||y - Ax - 0^1 + X x Y.Yj W Sn,d\ ^ + A ^ H w y]n,d\°v^ d \ ( 18 ) 

n=l d=l n=l d=l 

where 

W x,n,d ■— y\°x,n,d\ ^ °x) ' W y,n,d ■— y\°y,n,d\^ °y) 

Note that (fT8T ) is similar to the DRS one in ((6]) except that the entries of vector o in the regularization are 
weighted non-uniformly. Being convex, (fT8T ) can be solved as easily as ((6]). With reliable initialization 
offered by the solution of ©, one reason behind the enhancement offered by (TLST i is the bias correction 
to Lasso, which is known to yield reliable estimates of the o support but biased estimates of its nonzero 
entries Ell . Besides (fLSt . alternative means to mitigate such bias effects is to retain only the outlier-free 
measurements, after identifying them through the zero entries of o, and use them to re-run the clairvoyant 
KS which is unbiased. The improvement offered by these refined estimates and those obtained by solving 
(fLST ) will be corroborated via simulated tests. 

VI. Fixed-lag DRS for online operation 

The major limitation of fixed-interval smoothing is that the whole batch {y n }^Li has to be available 
prior to estimating {x n }^ =1 . This is useful for applications such as processing electroencephalograms 
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061 . but not for target tracking. In many tracking applications, state smoothing has to be performed 
online and stringent delay ("lag") constraints are imposed between the smoothed state instant and the 
time state estimates are formed. Online smoothing is also important in applications where the state is 
affected by abrupt changes since these events may be the manifestation of, e.g., system failures ll28l . 

When the outliers are absent in CO, optimal fixed-lag KS can be regarded as a special case of fixed- 
interval KS HI p. 176]. The goal here is to estimate x n , relying upon observations up to time n + I, 
where t denotes the estimation lag. Supposing that a KF has been run up to time n to yield the state 

and covariance estimates x n | n and S n | n , fixed-lag KS can be accomplished using 

f n+t 

^KS ■ J \ > I. -^j || 2 || n2 

X n:n+f = argmin \ ~X H^™' ~~ "-n'^-n 1 + 2~H Xn ~ x n\n\\-£-i 

\ ?t'=n+l 

1 n+e ~\ 

+ 2 2 H Xn ' - F n'Xn'-l||Q-i > (19) 
n'=n+l ) 

where x^ s n+£ := [(x^ s ) T , . . . , (xJ^) T ] T . Observe that fixed-lag KS in (fl9l) is a special case of fixed- 
interval KS, when the initial condition on the state, namely x n | n and S n | n , is given by the KF, and 
the state is smoothed over the interval [n,n + £]. Thus, the solution of ( fl9l ) can be found with either 
one of the two algorithms of Section II. However, since (fl9l ) does not account for outliers, the resulting 
estimator is not robust. To address this issue, a fixed-lag DRS is developed next. 

A. Fixed-lag DRS 

In the previous section, the fixed-lag KS was regarded as a special case of the fixed-interval KS with 
properly chosen smoothing interval and initial conditions. Furthermore, in Section |in] a fixed-interval 
DRS was developed, which is extended here to robustify the fixed-lag KS in dT9l) . Mimicking the steps 
followed in Section [III] to robustify ( fl9l ) is challenged by the fact that the initial conditions x n | n and 
E„| n are evaluated by the clairvoyant (and thus non-robust) KF. To overcome this obstacle, the fixed-lag 
KS will be recast in a form entailing the interval [n — w, n + £]; that is, 



n+i 



1 



- KS ■ I \ "> TT 

x n-w.n+e = argmm <J - ^ Un' ~ tln' x n'\\ R -} + ^\\ x n-w - x. 



X 



9 / j \\J n' -'Mr || j^- 1 i 2 II ""Ml— V) -Mi— w\n— wll^- 1 

V n'=n— w+1 

^ n+e \ 
+ - Yl K*' -Fn'Xn'-iUjj-i >. (20) 
n'=n— w+1 ) 
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The formulation in (l20l l is equivalent to that of (fl9T l in the sense that x^ s = xjp. It also suggests that 
the fixed-lag KS estimate at time n and lag £ can also be obtained by initializing its recursions with the 
KF estimates x n „ tu |„_ u , and Y, n _ w \ n _ w for arbitrary w. The fixed-lag DRS is obtained by robustifying 
the fixed-lag KS d20l) in a fashion similar to that used in Section [TTTJ that is, 

r - DRS x i _ 

L n— w.n+ii ,J x,n—w.n+li u y,n—w:n+£l 

fl n+£ 1 

min s — / 



arg min i / l|yn' H^Xu' Oy,n'||j^-i „|| x n— «> x n— tu|n— u> lis -1 

X,0;p ,Oy I ^ n' £ n — w\n — w 

\ n'=n—w+l 

+ 9 ll x n' — F n 'X n /_i — o Xjn / + ^ [Aj/lloj/^'lji + A^llo-c^'Hi] > .(21) 



2 

n'=n— w+1 n'=n—w+l ) 

Observe that eventual errors in y^ n _ w \ n _ w and S n _ ltJ | n _ lu due to the non-robust KF do not severely affect 
the estimates at time n provided that w is sufficiently large. Certainly, the larger the w, the larger the 
number of nuisance variables involved. 

The major limitation of the fixed-lag DRS in (|2TI ) is that a convex optimization problem has to be 
solved at each time n to obtain x^ RS . As a consequence, the associated computational burden to solve 
the fixed-lag DRS in (|2T1) is not comparable with that of the standard fixed-lag KS. This motivates 
approximating the fixed-lag approach in (1211 to enable online DRS at complexity comparable to that of 
standard fixed-lag KS and state-of-the-art non-linear smoothers. 

B. Online fixed-lag DRS 

The coordinate descent-based fixed-interval algorithm in Section|V]is properly modified in this section 
in order to solve the fixed-lag DRS problem formulated in (l2T1 i. Despite the fact that convergence to a 
solution of (|2T1) is provably guaranteed asymptotically (i.e., for infinite iterations), satisfactory estimates 
can be obtained with only a few coordinate descent iterations. 

Suppose that between two consecutive observations (say n + £ and n + £ + 1), the affordable delay 
allows for J coordinate descent iterations to be implemented. Furthermore, for a limited number of 
iterations, initializing with o^^.^^ and Oy n _ w . n+i close to their global optimum values provides a 
"warm start-up" considerably improving the performance. Observe that for estimating the state at time n, 
fixed-lag DRS entails smoothing over the interval [n — w,n + £], and after J coordinate descent iterations, 
the variables x^l w , n+l , o ( x J) n _ w . n+i , o { y J) n _ w . n+e become available. Since fixed-lag DRS at time n + 1 
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entails smoothing over the interval [n - w + 1, n + £ + 1], the variables o^ n _ w+1 . n+t o^ n _ w+1 . n+t 
can be initialized to . e and . c obtained from the previous J iterations, which provides 

the aforementioned warm start-up. Granted, when the window w is smaller, the effect of the non-robust 
initialization is more pronounced. Even though no analytical results are claimed on the performance 
as a function of w, the simulated RMSE comparisons in Section VIII with competing alternatives of 
comparable complexity, speak for the merits of this section's online algorithm. 

The novel fixed-lag DRS scheme amounts to sequentially running J KS's and combining their outputs. 
Interestingly, several non-linear smoothers including those based on SMC and IMM approaches also 
combine the outputs of several fixed-lag KS's, which allows for a fair comparison of these techniques. 

VII. Generalized linear state-space model 
Consider the more general linear state-space model, given by [cf. ([Tab l 

x n = F n x n _i + G n w n + o x>n , Vn = 1, . . . , N. (22) 

where {G n }^ =1 are known matrices. If matrix G n is tall, G n G^ is rank deficient, which prevents one 
from writing the WLS state error as in (@}. Instead, KS can be formulated as a constrained optimization 
problem, and likewise for the corresponding fixed-interval and fixed-lag DRS. Specifically, the novel 
fixed-interval DRS can be obtained as 

[x^^w^Oy] := arg min Cy u (x, w, o x , o y ) 

subject to Xn = F n x n _i + G n w n + o x>n , V n = 1, . . . , N (23) 

where w := [w^, w|\ . . . , w^] T and 

1 N 9 1 1 r 

n=l n=l 
N 

+ ^^[AgllOa^lll + \/ll°y,n||l]- 

n=l 

Due to the constrained nature of the problem in (|23l , coordinate descent iterations can not be directly 



applied. However, it is possible to develop iterations based on the alternating direction method of 
multipliers (AD-MoM) [7]. These iterations are simple if one introduces the auxiliary variables a n = o xn 
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and b n = Oj, n which imply additional constraints. Then, the augmented Lagrangian can be written as 
1 r 2 1 1 r 2 r 

n=l n=l n=l 



+ ^2 l*n( x n ~ F n X„_i - G„W n - O^n) + - ||x„ - F n X n _i - G n W n - O x , n \\?\ 
n=l 

N N 

+ [^n(°V,n ~ b «) + 2 " 0?/ ' n ~ b ""2] + ^2 [ U n(°x,n ~ »n) + T^IK.n ~ a n |||] . (24) 
n=l n=l 

where {x n , /x„, fn}^=i denote the Lagrange multipliers and k is a positive constant. Setting the 
derivatives of C K with respect to x n equal to zero, yields the following AD-MoM iteration [cf. (l24b l 



\l N _ 1 

x (j) = argmm< -J^||y n -H n x n -o^ n 1} |j^-i + -||x - m ||^-i 

X I n=l 



N 

■J2 \ \W - ?n*n-l ~ Gn^- l) ~ eg" 1 ) + X^ I '«l 
n=l 



Clearly, this problem is equivalent to ©, which can be solved in a batch or recursive form at complexity 
that is linear in N. Likewise, the remaining variables are updated as follows: 

wg' } = (Q- 1 + KGlG^GKxt^ + «x« - «F n x^! - ^og- 1 )), n = l,...,N 



y,n 


= (R" 1 + Kl^r^R- 1 ^ - H n xg)) - ^■~ 1 ) T + K bg- 


n = l, 


• • • J 


A r 




o(i) 

x,n 


= 5 (x^V* + 4P ~ Fnxjli - G„w£> + aO" 1 ) - ^ 




1,. 






b®, 

n,d 


= ^axd^ + ^l - A B ,0)sign(« O J;i 1<1 + ^ 1) ) 


, n = 1, . . . 


,iV, 


d= 1,.. 


.,£>„ 


a ij) 

u n,d 


= £ ™ + - X x , 0)sign( KO ^ id + ^f) 


, n= 1,... 


,A r , 


d = 1,.. 




A.n 


= ^-i) +K (xW-F n x£ 1 -G n w^-og) 1 ) ; n = l,. 


..,JV 










= ^-i)+«(og^-bC?')), n = l,...,A 










U U) 


= ^-D+^-aW), n = l,...,N. 











Invoking the results in (7J p. 256], guarantees global convergence of these iterations as asserted next. 
Proposition 5. For any k > and arbitrary initial values w(°), o y , oi , b(°), a(°), /i^ ^ , f/ie 
AD-MoM iterates are all convergent. Furthermore, every limit point of the sequences y&\ Oy\ 
Ox i s a solution of the problem in (l23l) . 
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Although global convergence of the coordinate descent and AD-MoM iterates is ensured by Propositions 
4 and 5, respectively, no analytical results are available in optimization theory regarding their rate of 
convergence - a challenging subject going well beyond the scope of the present work. 

For AD-MoM iterations too, the bulk of computations is in the order of a KS, which grows linearly in 
N. The rest involves closed-form evaluations. In a nutshell, for the general linear model in (l22l the AD- 
MoM iterations replace those of the coordinate descent algorithm with the same order of computational 
complexity. Some of the simulations in the ensuing section will test this AD-MoM based fixed-interval 
DRS approach, which also has a fixed-lag counterpart tailored for online operation under the general 
linear state-space model. Its derivation follows closely that of the coordinate descent for fixed-interval 
DRS, and is omitted for brevity. 

VIII. Simulated tests: Maneuvering target tracking with glint 

In this section, the developed robust smoothers are simulated for maneuvering target tracking in the 
presence of glint noise. First, DRS performance is tested on a sample target trajectory, and then sample 
averaged performance metrics for DRS are compared against the main competing alternatives. 



A. DRS on a Sample Trajectory 

The model in (|22l is simulated with x n := [p%, s^,Pn, Sn] T , where p^ and p v n denote the target position 
in the x and y coordinates, respectively; and correspondingly sf t and Sn denote the target velocity in the 
x and y directions; thus, D x = 4. The matrices in ([Tbl and (l22l are invariant V n 



i 1 t \ 



10 
1 r 
1 



/i! 

2 u 



G n :- 



\ 



( 



10 
10 



(25) 



/ 



r 
£ 

T 

and r denotes the sampling period. Since G n is tall, this so-termed discrete white noise acceleration 
(DWNA) model @ p. 273], can only be handled by the generalized linear state-space model of Section 
VII. The form of H n in d2"5K shows that y n comprises noisy position measurements, and D y = 2. 

A total of N = 100 observations are collected, r = 1, R n = 150 2 l2, Q n = O.5I2, mo = O4, and 
Sq = diag(50, 5, 50, 5). The target trajectory starts from position [0,0], and evolves according to the 
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DWNA model with the specified parameters from time n = 1 to n = 29. At times n = 30 and 31 
the target turns right and follows again the DWNA model from n = 32 to 59. At time n = 60 and 61 
the target turns left and then proceeds with the DWNA model until n = N. The true target trajectory 
is depicted in Fig. [2] with solid line. The circles represent the acquired position measurements. Three 
outliers (not depicted in the figure) yield erroneous position reports, at n = 15, 50, and 80. 

Figure [3] depicts the clairvoyant fixed-interval KS estimate. Observe that KS is not robust to outliers in 
the observations and state. The fixed-interval DRS estimates shown in Fig.|4]for X y = 0.01, and X x = 0.05 
(which approximately correspond to 10% of the critical X y and X x in Proposition 3), demonstrate that 
DRS can effectively cope with outliers, and has merits over the non-robust KS even when (X x ,X y ) are 
not systematically estimated as in © or (fTOl ). Figures [5] and [6] depict estimates of the fixed-lag KS in 
(fT9l ), and DRS in (|2TT > for lag t = 10, respectively. Again, the KS estimates are strongly affected by 
outliers. On the other hand, the fixed-lag DRS estimates in Fig. [6) for w = I, X y = 0.01, and X x = 0.05, 
are only minimally affected by outliers. 

B. Online Fixed-lag DRS versus Rao-Blackwellized SMC 

The root mean-square error (RMSE) of the position estimates is used here to quantify the performance 
improvement of DRS relative to KS. The true target trajectory coincides with that of Fig. [2] (solid line), 
when M = 100 noise and outlier realizations are present. With probability -k = 0.97, the model in (fTbl 
was in effect with o y = 0, H„ in (|25T ). and R n = 150 2 l2- With probability 1 — -k = 0.03, outliers in the 
observations occur, and in this case the position reports are [y n ,liVn,2\ ~ 10000, 10000] 2 ). Figure 
H depicts the RMSE of the position estimates, RMSE n = yj jj Y^=i - ) 2 + (pK ~ 9^ ) 2 ], 
where Ip^"^ ,pt ] is the estimated position at time n for the mth noise and outlier realization, for the 
fixed-interval KS and DRS with X y = 0.01, and X x = 0.05. Clearly, DRS exhibits lower RMSE than the 
clairvoyant KS. 

Figure [8] depicts the instantaneous RMSE for fixed-lag KS and DRS for £ = 10, w = 10, = 0.01, 
and X x = 0.05, along with the Rao-Blackwellized (RB) SMC smoother relying on 50 particles, and the 
online fixed-lag DRS with 50 AD-MoM iterations and constant k = 0.05 (referred in the figure as O- 
DRS). For the RB-SMC smoother, a conditionally linear, Gaussian model is adopted. Specifically, under 
nominal conditions, the model is that in (Q} with F n , G n , and H n as in (1251 ). o = 0, Q n = O.5I2, and 
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R n = 150 2 l2- It is assumed that the nominal conditions are in effect with probability -k = 0.97. Outliers 
in the measurements occur with probability 1 — ir = 0.03, and in this case R n = 15000 2 I 2 , which allows 
for down-weighting the respective measurements. With the same probability, state outliers emerge too, 
and in this case Q n = 50012- Clearly, conditioned on the outlier realizations, the dynamical process is 
linear and Gaussian. This allows for drawing particles for the state/measurement outliers, and using them 
for estimating the state via fixed-lag KS; see also lfT3l for details on the fixed-lag RB-SMC. In addition 
to fixed-lag KS, the novel O-DRS approach outperforms RB-SMC for the same computational burden. 

C. Comparison with RANSAC and Huber M-estimates 

DRS is compared here against state-of-the-art robust smoothers, namely the Huber based scheme, and 
a combined RANSAC followed by Huber scheme. In the latter, smoothing is cast as the linear regression 
problem in © to which RANSAC can be applied readily EDI . RANSAC relies on random draws (here 
100 or 1,000) to find the "best" possible subset of rows corresponding to the nominal model |[T4l . EDI . 
To ensure that the remaining outliers do not degrade performance, the nominal rows of © found by 
RANSAC are pre-whitened, and subsequently plugged into Huber's cost in ©. The Huber parameters 
are set to X x = \ y = 1.345 as suggested by ifTTl . E3l for standardized Gaussian nominal noise (this 
requires pre-whitening the nominal noise). The Huber estimate is found by solving © using the iteratively 
re- weighted least-squares (IRLS) algorithm in E3l . which unlike the Lasso-based solver pursued here, 
guarantees only local convergence. The fixed-interval DRS in © is also employed with X x and X y found 
using either of the two data-driven criteria suggested in Section IV. To further improve DRS, one iteration 
of the refined estimate in Remark 3 is also implemented. The model simulated here obeys E5T l. but with 
G n = I4, w n ~ AA(04,Q n ), Q n = diag(l, 0.001, 1, 0.001), and R n = 5I2. State and measurement 
outliers are generated as independent Laplacian with variances 200 and 20,000, respectively. The RMSE 
for both position and velocity estimates time-averaged over 100 Monte Carlo runs is plotted versus the 
percentage of outlier contamination. 

Fig. |9]plots the RMSE versus percentage of outliers for the combined RANSAC-Huber robust smoother 
as well as DRS, when outliers appear only in the measurements. The numerical suffix for DRS denotes 
the grid size used for the AVD [cf. ([Toll, while the one for RANSAC stands for the number of RANSAC's 
random draws. In terms of complexity, DRS- 100 (with I x = I y = 10 grid points equispaced in log scale 
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as suggested in 03) lies in-between RANSAC-100 and RANSAC-1000. Note that up to 50% outliers all 
three methods perform similarly. When the outlier contamination percentage exceeds 50%, RANSAC-100 
performs poorly, while RANSAC-1000 and DRS-100 exhibit graceful performance degradation. Due to 
its lower-complexity, DRS-100 offers a better alternative than RANSAC-1000. 

Fig. [10] compares RMSE performance when outliers are only present in the state. One observes that 
DRS-100 considerably outperforms both versions of RANSAC for all percentages of outlier contamina- 
tion. The improvement in going from RANSAC-100 to RANSAC-1000 is not noticeable. 

Fig. [UJplots the RMSE resulting from DRS-100, batch KS, Huber-only, and the combined RANSAC- 
Huber scheme when outliers are simultaneously present in the state and measurements. The AVD criterion 
is used for DRS. It can be seen that RANSAC-Huber combination performs poorly for outliers present 
in the state and measurement. This happens because RANSAC removes certain rows of the regression 
matrix, namely those contaminated by outliers, which renders the remaining sub-matrix ill-conditioned. 
Huber-only performs close but worse than KS - a manifestation of the fact that Huber's estimate are 
found for independent nominal noise. Even though the noise here is independent, it is not standard 
Gaussian and the subsequent pre-whitening, which is a mere scaling in this case, adversely affects the 
Huber-based estimate. Indeed, neither of the mentioned robust methods performs well when outliers are 
present both in the state and measurement, and surprisingly even the clairvoyant KS outperforms them. 
However, DRS-100 outperforms KS in terms of RMSE, and speaks for the importance of the universality 
property of the novel estimator. 

The DRS improvement over KS is more pronounced if the percentage of outliers is known, case where 
© is used instead of AVD. The result is plotted in Fig. Q21 where DRS significantly outperforms KS. 
Here, the percentage of state outliers is fixed at 10%, while that of measurement outliers is variable. 

At last, different DRS renditions are compared against each other and with the robust smoother of 
j3l . For a fair comparison with |]3), the setup includes outliers only in the measurements and smoothed 
estimates for both approaches are formed using the general-purpose optimization software SeDuMi ll34l . 
Each randomly occurring outlying-measurement is drawn from a zero-mean uniform distribution with 
variance 20, 000 independently from the nominal random variables. Note that the outlying measurements 
here are generated not in accordance with the model in fl3] in order to illustrate the universality attribute 
of the proposed DRS. Nominal model parameters commonly known to both DRS and the smoother 
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l3l . are chosen as before with the only difference that Q n = diag(100, 1, 100, 1). Fig. [13] depicts the 
mean and median RMSE computed over 1,000 Monte Carlo runs as a function of the percentage of 
outliers. It can be seen that DRS with AVD outperforms the smoother in Q, especially as the fraction of 
outlier contamination exceeds 10%. Similar to the smoother in 0, DRS with AVD utilizes only nominal 
parameter knowledge to form its estimate. The curve that utilizes the concave penalty pertains to the 
refinement outlined in Remark 3. A clear gain is observed with this refinement as a result of de-biasing 
the DRS estimates. DRS with known percentage of outliers is also plotted and outperforms all other 
alternatives. This is due to the extra knowledge on outlier sparsity that this smoother benefits from. 



Robust smoothers were developed for dynamical processes contaminated with outliers in the observa- 
tions and/or state. The novel fixed-interval DRS can be viewed as an £i-norm regularized version of the 
WLS-based clairvoyant KS algorithm. This form of regularization controls the sparsity of outliers, which 
are explicitly introduced as auxiliary variables. Two data-driven methods were also devised to select the 
associated regularization parameters. Block coordinate descent-based iterations were developed to solve 
the underlying convex optimization problem in an efficient manner. To enable real-time smoothing for 
delay-constrained applications such as target tracking, an online fixed-lag DRS was also developed. At 
last, the novel approach was broadened to include generalized linear state-space models. Numerical tests 
demonstrated that the proposed algorithms can jointly cope with state and measurement outliers, and 
outperform state-of-the-art methods at comparable computational burden. 

Appendix 

A. Proof of Proposition 1 (MAP optimality of DRS in 

Successive application of B ayes' rule, as well as the assumptions on independence and the correspond- 
ing distributions of the nominal noise and outlier vectors yield 

j>\x.,o x ,o y \yi:N) = t s — — oc p(x ) p(y n x 0: „,yi : n-i, o xl:n ,o y 1:n ) 

Xp(Xn| x 0:n-l>yi:n-l>°a:,l:n! Oy t i :n )p(Oy in \xo :n -i,yi :n -i,0 Xl i :n) O y> i :n -i) 



IX. Conclusions 



N 



Xp(o x ,n| x 0:n-1) yi:n-l> °a;,l:ri-l) °2/,l:n-l 



) = p( x o) n p(y 



n 



O x ,n)p(Oy,n)p(o Xtn ) 



n=l 
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= A/"(x ;mo,S ) 

n=l 

where A/"(x; mo, So) represents a Gaussian distribution with mean mo and covariance So, while £(o; A) := 
[Id (A/2) exp(— X\od\) oc exp(— A||o||i) represents the joint Laplacian distribution for a vector with 
independent entries. Maximizing d26l ) amounts to minimizing the negative of the exponent, which leads 
to the DRS criterion in ©. 

B. Proof of Proposition 2 (Equivalence of © with (f7])) 

Consider minimizing the cost in © over o y and o x , with x fixed. Given x and with {Q n ,R n }^ =1 
and So given by identity matrices, the criterion in © is separable over each scalar entry of o y and o x . 
Hence, it suffices to find 

o y ,n,d = arg min { \{y n 4 - h^ rf x„ - o y ^ d ) 2 + \y\o y , n , d \ \,n = 1, . . . , N, d = 1, . . . ,D y (27a) 
o x , n ,d = arg min j \{x n ^ d - f^ d x n „i - o Xtn4 ) 2 + X x \o x , n4 \ \,n = 1, . . . ,N, d = 1, . . . , D x (21b) 
The scalar problems in (l2"7T i admit, respectively, the following closed-form solutions (see, e.g., 11291 ): 

if \y n ,d ~ h^ d x n | < X y 



Oy,n,d 



0. 



Vn,d ~ K d x n ~ X y sign(y M - d x n ), otherwise 



O x n,d 



0, if \x ntd - f^ d x n _i| < A a 

x n,d ~ f ^d x n-i - X x sign(x M - f£ d x n _i), otherwise 
Substituting d28l ) in the DRS cost of ((6]), the subsequent optimization problem in x is 



(28a) 



(28b) 



x 



arg mm 

X 



N Dy 



I 1 

Z, S ( ~ h M X «) 2]l l^,.-h^ d x„|<A !/ (x) 

n=ld=l ^ 

+ (X y \y n ,d ~ hLx n | - X y /2)l l y rtd _ h T dXnl>Xy (x) S J + ^ Q(z ,d - rn Q)d j- 



N D w 



n=l d=l ^ 

+ {X x \x n>d - f^ d x n _i| - A^/2)l| a , n (i _ f T tiXn _ i | >Aa; (x)^ . (29) 

Given the definition of Huber's cost, the problem in (1291 is equivalent to £7]). Therefore, © and © are 
equivalent. 
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C. Proof of Proposition 3 

Letting L(x, o x , o^) denote the cost in (_} and V the subgradient operator, the optimality conditions 
for the non-differentiable problem in (_) are 11321 p. 126] 

€ Vo v>n L(±,6 x ,6 y ) ^ Oe R-n l6 y,n - R-n^Yn - H n x n ) + X y 6 y>n (30a) 

£ V 0aJt „^(x, 6 Z , 6 y ) ^ 0£ Qn lo x,n - Qn H^n ~ F n X n _i) + \ x 6 x , n (30b) 

where 6 x>n := [o^i, 6^2, . . . , d x ^ D:c ] T and 6 y , n := [o W)Tl) i, 6 yin>2 , . . . , d y ^ )Dy \ T are the subgradients 
°f 1 1 °x,n ||i and Ho^nlli, respectively, whose dth entries are given by 

sign(oa; n d ), o x n d ^ 



Oy,n,d * 



sign(6 ?/ 

j Ox,n,d * 

Sn,di O y ,n,d — 



tn,di 6x,n,d — 



for any |s n , d | < 1 and \t n4 \ < 1. 

DRS coincides with KS when 6 y = 0nd v and 6 X = 0nd t , which implies x := x KS . For 6 y = 0nd v , 
(I30ab is satisfied with x := x KS if and only if (l8ab holds. Similarly, for 6 X = 0nd„, (I30bl) is satisfied 
with x := x KS if and only if ([80) holds. QED 
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Fig. 1: Quadratic cost versus Huber cost (A = 2). 
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Fig. 2: True target trajectory (solid line); Observed 
positions (circles). The squares indicate the tra- 
jectory instants where outliers occur (n = 15, 50, 
and 80). Outlier-corrupted measurement values are 

yi 5 = [-5560,18440] T , y 50 = [3880, 14440] T , 
and y 80 = [6440, -14800] T . 
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Fig. 3: True target trajectory (solid line) and esti- 
mated trajectory (circles) using fixed-interval KS. 
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Fig. 4: True target trajectory (solid line) and esti- 
mated trajectory (circles) using fixed-interval DRS 
(X y = 0.01, Ax = 0.05). 
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Fig. 5: True target trajectory (solid line) and esti- 
mated trajectory (circles) using fixed-lag KS. 
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Fig. 6: True target trajectory (solid line) and 
estimated trajectory (circle) using fixed-lag DRS 
(A y = 0.01, = 0.05). 
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Fig. 7: RMSE analysis of the fixed-interval KS 
versus DRS (X y = 0.01, X x = 0.05). 
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Fig. 8: RMSE analysis of the fixed-lag KS versus 
DRS (\ y = 0.01, Aa, = 0.05), online DRS (k > 0, 
0.05, J = 50 AD-MoM 
iterations), and Rao-Blackwellized SMC smoother 
(50 particles). 
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Fig. 9: Mean RMSE ± std. deviation for estimates 
formed by RANSAC followed by Huber robustifi- 
cation versus DRS: Measurement outliers only. 
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Fig. 10: Mean RMSE ± std. deviation for estimates 
formed by RANSAC followed by Huber robustifi- 
cation versus DRS: State outliers only. 
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Fig. 11: Outliers present in state and measure- 
ments. 
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Fig. 12: DRS versus LS with known percentage of 
outliers. 
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Fig. 13: Comparison among different DRS renditions with the smoother in [3]: (left) Mean RMSE ± 
std. deviation; (right) Median RMSE. 
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